Free cookie consent management tool by TermsFeed SciPy hierarchical clustering using complete-linkage | Pythontic.com

SciPy hierarchical clustering using complete-linkage

Overview:

The single-linkage algorithm and the complete linkage algorithm both work in similar ways. Both of them use the minimum distance between clusters while combining them into another cluster. The difference lies when calculating the inter-cluster distance between a cluster with more than one node and a cluster with one or more number of nodes.

In complete-linkage, to calculate the inter-cluster distance between a cluster with nodes (e1, e2) and another cluster (e3, e4) the maximum of distance(e1, e3), distance(e1, e4), distance(e2, e3) and distance(e2, e4) is selected. This is the reason for the algorithm to be called as farthest neighbour algorithm.

While neighbours are always nearby entities, the single-linkage algorithm selects the nearest neighbour and the complete linkage algorithm selects the farthest neighbour.

Choosing a neighbour during complete-linkage clustering

Example:

The Python code to perform hierarchical clustering using the complete-linkage method is same as the single-linkage example, except for the invocation of the linkage() function. The linkage() function invocation has to be modified with “complete” for the parameter “method” as given here:

cluster_linkage = hc.linkage(brain_weights_birds, method="complete")

The invocation of scipy linkage() function using the method “complete” performs agglomerative hierarchical clustering through the complete-linkage algorithm.

The data-points are the same as used in the example for hierarchical clustering through single-linkage. For the given data-points though the resultant clusters and the dendogram look the same for both single-linkage and complete-linkage methods the order in which these clusters are created and linked differ. 

Output - The linkage matrix:

(5, 4)
[[0.         1.         0.05040794 2.        ]
 [3.         4.         0.20024984 2.        ]
 [2.         6.         0.20039966 3.        ]
 [5.         8.         0.70291963 4.        ]
 [7.         9.         1.7001297  6.        ]]

Distance matrix after iteration 1:

At this iteration the pair of data-points with minimal distance between them is linked together to form the first cluster. The cluster consists of indexes 0 and 1 of the ndarray which refer to the elements Parrot and Crow.

 

Parrot

Crow

Sparrow

Seagull

Snowy Owl

Duck

Parrot

0

0.05040794

0.20039966

1.30000105

1.50002324

0.50262991

Crow

0.05040794

0

0.15013015

1.35000836

1.55007018

0.55302356

Sparrow

0.20039966

0.15013015

0

1.50004033

1.7001297

0.70291963

Seagull

1.30000105

1.35000836

1.50004033

0

0.20024984

0.8017537

Snowy Owl

1.50002324

1.55007018

1.7001297

0.20024984

0

1.00092407

Duck

0.50262991

0.55302356

0.70291963

0.8017537

1.00092407

0

Inter-cluster distance = 0.05040794.

Distance matrix after iteration 2:

At this iteration the indexes three and four are linked together to form a cluster. These indexes refer to the elements Seagull and Snowy Owl. The inter-cluster distance is 0.20024984. The number of items in the cluster is two. When compared with single-linkage algorithm the clustering order differs as the selection criteria for the neighbour changes.

 

(Parrot, Crow)

Sparrow

Seagull

Snowy Owl

Duck

(Parrot, Crow)

0

Max(0.20039966, 0.15013015) = 0.20039966

Max(1.30000105, 1.35000836) = 1.35000836

Max(1.50002324, 1.55007018) = 1.55007018

Max(0.50262991,

0.55302356) = 0.55302356

Sparrow

Max(0.20039966, 0.15013015) = 0.20039966

0

1.50004033

1.7001297

0.70291963

Seagull

Max(1.30000105, 1.35000836) = 1.35000836

1.50004033

0

0.20024984

0.8017537

Snowy Owl

Max(1.50002324, 1.55007018) = 1.55007018

1.7001297

0.20024984

0

1.00092407

Duck

Max(0.50262991,

0.55302356) = 0.55302356

0.70291963

0.8017537

1.00092407

0

Min(0.20039966, 1.35000836, 1.55007018, 0.55302356, 1.50004033, 1.7001297, 0.70291963, 0.20024984, 0.8017537, 1.00092407) = 0.20024984

Distance matrix after iteration 3:

At this iteration the cluster formed at iteration one and the index two are linked together which adds the element Sparrow to the cluster containing Parrot and Crow. The inter-cluster distance is 0.20039966 which is given by maximum(distance (parrot, sparrow), distance (crow and sparrow).

 

(Parrot, Crow)

Sparrow

(Seagull,

Snowy Owl)

Duck

(Parrot, Crow)

0

Max(0.20039966,

0.15013015) = 0.20039966

Max(1.30000105

1.35000836, 1.50002324,

1.55007018)=1.55007018

Max(0.50262991,

0.55302356) = 0.55302356

Sparrow

Max(0.20039966,

0.15013015) = 0.20039966

0

Max(1.50004033,

1.7001297) = 1.7001297

0.70291963

(Seagull,

Snowy Owl)

Max(1.30000105

1.35000836, 1.50002324,

1.55007018)=1.55007018

Max(1.50004033,

1.7001297) = 1.7001297

0

Max(0.8017537,

1.00092407) = 1.00092407

Duck

Max(0.50262991,

0.55302356) = 0.55302356

0.70291963

Max(0.8017537,

1.00092407) = 1.00092407

0

Min(0.20039966, 1.55007018, 0.55302356, 1.7001297, 0.70291963, 1.00092407) = 0.20039966

Distance matrix after iteration 4:

At this iteration index 5 is linked to the cluster formed at the iteration 3. The index of the linkage matrix corresponding to the iteration 3 is 8 as given by the fourth row of the linkage matrix. To get the actual index on the linkage matrix this value has to be subtracted from the number of data-points. Actual index = value - length(data points) 

 

(Parrot, Crow, Sparrow)

(Seagull,

Snowy Owl)

Duck

(Parrot, Crow, Sparrow)

0

Max(1.30000105,

1.50002324,

1.35000836,

1.55007018,

1.50004033,

1.7001297) = 1.7001297

Max(0.50262991,

0.55302356,

0.70291963) = 0.70291963

(Seagull,

Snowy Owl)

Max(1.30000105,

1.50002324,

1.35000836,

1.55007018,

1.50004033,

1.7001297) = 1.7001297

0

Max(0.8017537

1.00092407) = 1.00092407

Duck

Max(0.50262991,

0.55302356,

0.70291963) = 0.70291963

Max(0.8017537

1.00092407) = 1.00092407

0

Min(1.7001297, 0.70291963, 1.00092407) = 0.70291963

Distance matrix after iteration 5:

 

(Parrot, Crow, Sparrow, Duck)

(Seagull,

Snowy Owl)

(Parrot, Crow, Sparrow, Duck)

0

Max(1.30000105,

1.50002324,

1.35000836,

1.55007018,

1.50004033,

1.7001297,

0.8017537,

1.00092407) = 1.7001297

(Seagull,

Snowy Owl)

= Max(1.30000105,

1.50002324,

1.35000836,

1.55007018,

1.50004033,

1.7001297,

0.8017537,

1.00092407) = 1.7001297

0


Copyright 2025 © pythontic.com